Workdocumentation 2022-08-23

From BITPlan ceur-ws Wiki

Participants

Agenda

new server

New Server

qlever dblp

clone qlever-control

wf@wikidata:/hd/mantax/qlever$ git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 399, done.
remote: Counting objects: 100% (239/239), done.
remote: Compressing objects: 100% (150/150), done.
remote: Total 399 (delta 94), reused 211 (delta 88), pack-reused 160
Receiving objects: 100% (399/399), 125.08 KiB | 7.36 MiB/s, done.
Resolving deltas: 100% (149/149), done.
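Running the clone a second time on the same machine fails, because git refuses to clone into an existing non-empty directory. Below is a minimal bash sketch for repeatable setup. It assumes the same URL and target directory as above, and the function name `ensure_clone` is hypothetical (not part of qlever-control):

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repository only if the target is not yet a git checkout.
ensure_clone() {
  local url="$1" dir="$2"
  if [ -d "$dir/.git" ]; then
    # Already cloned: leave it alone (run "git -C <dir> pull" to update instead)
    echo "already cloned: $dir"
    return 0
  fi
  git clone "$url" "$dir"
}

# Usage, matching the clone above (run in /hd/mantax/qlever):
# ensure_clone https://github.com/ad-freiburg/qlever-control qlever-control
```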

qlever dblp setup

wf@wikidata:/hd/mantax/qlever$ mkdir dblp
wf@wikidata:/hd/mantax/qlever$ cd dblp
wf@wikidata:/hd/mantax/qlever/dblp$ . ../qlever-control/qlever dblp

QLEVER CONFIG

Checking your PATH ...
Added the directory "/hd/mantax/qlever/qlever-control" to your PATH

Setting up bash autocompletion ...
Done, number of completions: 35

Creating new Qleverfile ...
No pre-configuration name specified (as argument of ". qlever"). Copied default
Qleverfile to current directory, please edit and check.

Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:

qlever get-data
qlever index
qlever start
qlever example-query
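The typical sequence above can be scripted so that it stops at the first failing action. This is a minimal sketch that assumes `qlever` is on the PATH as set up above and is run in the dblp directory. The function name `run_qlever_sequence` and the variables `QLEVER_CMD` and `DRY_RUN` are hypothetical. With `DRY_RUN=1`, each action gets a trailing `show`, which according to the setup message displays what the action does without executing it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the typical qlever actions in order, stop on the first failure.
run_qlever_sequence() {
  local cmd="${QLEVER_CMD:-qlever}"   # overridable, e.g. to test with a stub
  local action extra=""
  # DRY_RUN=1: a trailing "show" prints what each action does without executing it
  if [ "${DRY_RUN:-0}" = "1" ]; then extra="show"; fi
  for action in get-data index start example-query; do
    echo ">>> $cmd $action $extra"
    # $extra is unquoted on purpose: when empty it expands to no argument at all
    if ! "$cmd" "$action" $extra; then
      echo "action '$action' failed, stopping" >&2
      return 1
    fi
  done
}
```

Usage in /hd/mantax/qlever/dblp: run `DRY_RUN=1 run_qlever_sequence` first to review the generated commands, then `run_qlever_sequence` to execute them.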

qlever dblp get-data

qlever get-data

This is the "qlever" script, call without argument for help

Executing "get-data":

wget -nc -O dblp.ttl.gz https://dblp.org/rdf/dblp.ttl.gz

Getting data using GET_DATA_CMD from Qleverfile ...

--2022-08-24 08:42:22--  https://dblp.org/rdf/dblp.ttl.gz
Resolving dblp.org (dblp.org)... 192.76.146.204
Connecting to dblp.org (dblp.org)|192.76.146.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1068155173 (1019M) [application/x-gzip]
Saving to: ‘dblp.ttl.gz’

dblp.ttl.gz         100%[===================>]   1019M  39.2MB/s    in 30s     

2022-08-24 08:42:52 (33.8 MB/s) - ‘dblp.ttl.gz’ saved [1068155173/1068155173]
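Before indexing, a quick check that the download is complete can prevent a failed index build. The minimal sketch below compares the file size with the length wget reported (1068155173 bytes above) and then uses `gzip -t` to test the archive. The function name `verify_download` is hypothetical. Note that `gzip -t` reads the whole 1019M file, so it takes a moment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check size and gzip integrity of a downloaded file.
verify_download() {
  local file="$1" expected="$2" actual
  actual=$(wc -c < "$file") || return 1
  actual=$((actual))   # strip padding some wc implementations emit
  if [ "$actual" -ne "$expected" ]; then
    echo "size mismatch: $actual != $expected" >&2
    return 1
  fi
  # gzip -t decompresses without writing output and fails on corrupt or truncated archives
  gzip -t "$file" || return 1
  echo "ok: $file ($actual bytes)"
}

# Usage for the dblp dump:
# verify_download dblp.ttl.gz 1068155173
```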

qlever dblp index

qlever index

This is the "qlever" script, call without argument for help

Executing "index":

docker run -it --rm -u 10000:10000 -v /hd/mantax/qlever/dblp:/index -w /index --entrypoint bash --name qlever.dblp.index-build adfreiburg/qlever -c "zcat dblp.ttl.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --text-words-from-literals | tee dblp.index-log.txt"

Unable to find image 'adfreiburg/qlever:latest' locally
latest: Pulling from adfreiburg/qlever
125a6e411906: Pull complete 
7c46a5754a97: Pull complete 
ac188e4fc015: Pull complete 
e574534926c8: Pull complete 
e013d756805d: Pull complete 
e839a7c7b682: Pull complete 
072cb0a0501c: Pull complete 
a4a3efb708f4: Pull complete 
Digest: sha256:23ddc354c1b85d85ce56e0659875b7c9a89bc4c6778f3d7e91140d882c681bec
Status: Downloaded newer image for adfreiburg/qlever:latest
2022-08-24 06:50:21.033	- INFO:  QLever IndexBuilder, compiled on Wed Aug 24 00:11:41 UTC 2022 using git hash 3d1a56
2022-08-24 06:50:21.034	- INFO:  You specified the input format: TTL
2022-08-24 06:50:21.034	- INFO:  Locale was not specified in settings file, default is en_US
2022-08-24 06:50:21.034	- INFO:  You specified "locale = en_US" and "ignore-punctuation = 0"
2022-08-24 06:50:21.035	- INFO:  You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2022-08-24 06:50:21.035	- INFO:  Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-08-24 06:50:21.035	- INFO:  Processing input triples from /dev/stdin ...
2022-08-24 06:51:50.419	- INFO:  Input triples processed: 100,000,000
2022-08-24 06:53:13.171	- INFO:  Input triples processed: 200,000,000
2022-08-24 06:56:32.749	- INFO:  Triples converted: 200,000,000
2022-08-24 06:56:42.036	- INFO:  Done, total number of triples converted: 264,910,951
2022-08-24 06:56:42.038	- INFO:  Building prefix tree from internal vocabulary ...
2022-08-24 06:57:07.519	- INFO:  Computing maximally compressing prefixes (greedy algorithm) ...
2022-08-24 06:58:08.866	- INFO:  Reduction of size of internal vocabulary: 29%
2022-08-24 06:58:11.061	- INFO:  Writing compressed vocabulary to disk ...
2022-08-24 06:58:42.020	- INFO:  Creating a pair of index permutations ... 
2022-08-24 06:59:54.815	- INFO:  Statistics for PSO: #relations = 65, #blocks = 523, #triples = 259,133,822
2022-08-24 06:59:54.815	- INFO:  Statistics for POS: #relations = 65, #blocks = 523, #triples = 259,133,822
2022-08-24 06:59:54.815	- INFO:  Exchanging multiplicities for PSO and POS ...
2022-08-24 06:59:54.815	- INFO:  Writing meta data for PSO and POS ...
2022-08-24 06:59:58.757	- INFO:  Creating a pair of index permutations ... 
2022-08-24 07:00:52.468	- INFO:  Statistics for SPO: #relations = 44,889,872, #blocks = 330, #triples = 259,133,822
2022-08-24 07:00:52.468	- INFO:  Statistics for SOP: #relations = 44,889,872, #blocks = 330, #triples = 259,133,822
2022-08-24 07:00:52.468	- INFO:  Exchanging multiplicities for SPO and SOP ...
2022-08-24 07:01:00.962	- INFO:  Writing meta data for SPO and SOP ...
2022-08-24 07:01:01.102	- INFO:  Number of distinct patterns: 1,278
2022-08-24 07:01:01.102	- INFO:  Number of subjects with pattern: 44,889,872 [all]
2022-08-24 07:01:01.102	- INFO:  Total number of distinct subject-predicate pairs: 208,837,094
2022-08-24 07:01:01.102	- INFO:  Average number of predicates per subject: 4.7
2022-08-24 07:01:01.102	- INFO:  Average number of subjects per predicate: 3,314,875
2022-08-24 07:01:05.633	- INFO:  Creating a pair of index permutations ... 
2022-08-24 07:01:55.634	- INFO:  Statistics for OSP: #relations = 85,991,087, #blocks = 417, #triples = 259,133,822
2022-08-24 07:01:55.634	- INFO:  Statistics for OPS: #relations = 85,991,087, #blocks = 417, #triples = 259,133,822
2022-08-24 07:01:55.634	- INFO:  Exchanging multiplicities for OSP and OPS ...
2022-08-24 07:02:13.771	- INFO:  Writing meta data for OSP and OPS ...
2022-08-24 07:02:13.938	- INFO:  Index build completed
2022-08-24 07:02:14.108	- INFO:  
2022-08-24 07:02:14.108	- INFO:  Adding text index ...
2022-08-24 07:02:14.108	- INFO:  Considering each literal as a text record
2022-08-24 07:02:14.109	- INFO:  The git hash used to build this index was 3d1a56
2022-08-24 07:02:14.109	- INFO:  Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-24 07:02:15.628	- INFO:  Done, number of words: 92,198,728
2022-08-24 07:02:15.628	- INFO:  Number of words in external vocabulary: 3
2022-08-24 07:02:15.628	- INFO:  Building text vocabulary ...
2022-08-24 07:02:54.328	- INFO:  Writing vocabulary to file dblp.text.vocabulary ...
2022-08-24 07:02:54.404	- INFO:  Done, number of words: 9,473,027
2022-08-24 07:02:54.445	- INFO:  Building the half-inverted index lists ...
2022-08-24 07:11:33.076	- INFO:  Statistics for text index: #records = 32,082,807, #words = 257,295,941, #entities = 32,082,807, #blocks = 32,309,703
2022-08-24 07:11:34.814	- INFO:  Text index build completed
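The index command tees its output to dblp.index-log.txt, so the run can be summarized afterwards. The minimal sketch below defines two hypothetical helpers. The first checks that all six permutations (PSO/POS, SPO/SOP, OSP/OPS) report the same number of triples. The second computes the wall-clock time between the first and last timestamped lines. It assumes that the IndexBuilderMain messages shown above end up in that file, that the log wording matches the lines above, and that GNU `date` is available for `-d`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the index log written by "tee dblp.index-log.txt".
# The patterns assume the exact log wording shown above.

# Print the common #triples value of all permutations; fail if they disagree.
check_permutation_triples() {
  local log="$1" counts
  counts=$(grep -E 'Statistics for (PSO|POS|SPO|SOP|OSP|OPS):' "$log" \
    | sed -E 's/.*#triples = ([0-9,]+).*/\1/' | tr -d ',' | sort -u)
  if [ -z "$counts" ]; then
    echo "no permutation statistics in $log" >&2
    return 1
  fi
  if [ "$(printf '%s\n' "$counts" | wc -l)" -ne 1 ]; then
    echo "permutations disagree: $(printf '%s ' $counts)" >&2
    return 1
  fi
  echo "$counts"
}

# Print the time between the first and last timestamped lines (needs GNU date -d).
log_elapsed() {
  local log="$1" stamps first last secs
  # Keep only "YYYY-MM-DD HH:MM:SS" from lines that start with a timestamp
  stamps=$(grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8}' "$log" | cut -c1-19)
  if [ -z "$stamps" ]; then
    echo "no timestamps in $log" >&2
    return 1
  fi
  first=$(printf '%s\n' "$stamps" | head -n 1)
  last=$(printf '%s\n' "$stamps" | tail -n 1)
  # -u: interpret as UTC so a DST change cannot distort the difference
  secs=$(( $(date -u -d "$last" +%s) - $(date -u -d "$first" +%s) ))
  printf '%dm%02ds\n' $((secs / 60)) $((secs % 60))
}

# Usage (expected values follow from the log lines shown above):
# check_permutation_triples dblp.index-log.txt   # 259133822
# log_elapsed dblp.index-log.txt                 # 21m13s
```

Based on the lines above, the triple index took about 12 minutes (06:50:21 to 07:02:13), and the text index took about 9 more (07:02:14 to 07:11:34). The pattern statistics are also consistent: 208,837,094 subject-predicate pairs / 44,889,872 subjects ≈ 4.65, which the log rounds to 4.7 predicates per subject. However, 264,910,951 triples were converted, while each permutation holds 259,133,822, a difference of 5,777,129. The log does not state the reason for this difference.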